FIELD OF THE INVENTION

The present invention relates generally to data processors for high speed
communication systems and networks. More particularly, the present invention relates to
processors for real-time analysis and processing of network data.



BACKGROUND OF THE INVENTION 

Network communication devices are, in general, protocol dependent.
Since devices which communicate within computer and storage networks must strictly
adhere to rapidly changing protocols associated with those networks, it has become clear
that the use of protocol-independent network processors to analyze, generate and process
traffic within these networks is of extreme practical and business importance.



As such, network communication devices typically include specially designed
protocol-specific state machines and decoder logic. Protocol-specific hardware offers the
advantages of high performance and cost-effectiveness. However, high-speed networking
protocol standards are in a state of flux: new protocols are emerging and changing all the
time. Since protocol-specific hardware designs are not reusable for different protocols,
major redesign efforts are expended in producing protocol-specific hardware for these
emerging protocols. Furthermore, protocol-specific hardware designs cannot be easily
upgraded to include new features and functionality. In most cases, modifications to the
hardware itself must be made.



SUMMARY OF THE INVENTION

An embodiment of the present invention includes a network traffic processor. The
processor itself is protocol independent; it does not have any hardwired logic for
recognizing packets, frames, or any other protocol-specific entities. Framing-based tasks
are performed inside the processor using user-defined software instructions. Thus, the same
processor may be used to implement network data processing systems for virtually any
protocol. Furthermore, new features and functionality can be easily added to the network
traffic processor through software upgrades. As a result, the development cost of network
data processing systems, as well as the cost of upgrading the system, can be greatly
reduced.

The network traffic processor of the present invention is capable of synchronously
processing and generating data for high-speed protocols (serial or otherwise), on a
wire-speed, word-by-word basis. Significantly, the processor is capable of operating on data
directly on its input/output busses without requiring the data to be moved in and out of
registers or internal memory units. The low overhead of operating on data directly on its
input/output busses minimizes the total clock cycles required to process and generate each
I/O data word. The network processor receives and transmits data on every clock, and
executes instructions upon the same clock, eliminating the need for polling or interrupts to
determine whether data is ready to be read or written.

According to an embodiment of the present invention, multiple synchronous
network traffic processors may be implemented in a system, in a chain mode or otherwise,
for providing a multitude of programmable functions. The synchronous network traffic
processor may also be integrated with other hardware functions, such as other types of
processors, memory controllers, FIFOs, etc.

The synchronous network traffic processor, in one embodiment, has a low gate count
and can be easily implemented using programmable logic (e.g., FPGA). An appropriately
programmed synchronous network traffic processor may replace modules traditionally
implemented with hard-wired logic or ASICs.


BRIEF DESCRIPTION OF THE DRAWINGS 

Additional features of the invention will be more readily apparent from the 
following detailed description and appended claims when taken in conjunction with the 
drawings, in which: 

Figure 1 is a block diagram illustrating the main functional units of a synchronous
network data processor in accordance with an embodiment of the present invention.

Figure 2A is a block diagram illustrating an exemplary implementation of two input
pipelines of the input pipeline unit in accordance with one embodiment of the invention.

Figure 2B is a block diagram illustrating an exemplary implementation of two
pass-through pipelines of the input pipeline unit in accordance with one embodiment of the
invention.

Figure 3A is a block diagram illustrating an exemplary implementation of the data
compare unit in accordance with one embodiment of the invention.

Figure 3B is a block diagram illustrating an exemplary implementation of the source
select and mask unit of Figure 3A.

Figure 3C is a block diagram illustrating an exemplary implementation of the flag
update unit of Figure 3A.

Figure 4 is a block diagram illustrating an exemplary implementation of the data
modify unit in accordance with an embodiment of the present invention.

Figure 5 is a block diagram illustrating an exemplary high-speed data modification
system implemented with synchronous network data processors of the present invention.

Figure 6 is a block diagram illustrating a general network data processing system
implemented with synchronous network data processors of the present invention.



DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention provides a processor for synchronously processing and
generating data for high speed serial protocols on a word-by-word basis. In contrast to
conventional microprocessors, whose main focus is on register and memory operations, an
emphasis of the present invention is I/O processing. The processor of the present invention
is capable of operating directly on the data streams in its I/O busses without requiring the
data to be moved in and out of registers or internal memory. In addition, the processor of
the present invention has a wide instruction set. These factors reduce the total clock cycles
required to process and optionally modify each I/O data word. Indeed, in one embodiment
of the present invention, a data word may be processed and modified in a single instruction
clock cycle.

Significantly, the processor of the present invention executes instructions
synchronously with a master clock that drives the I/O busses. In one embodiment, the
processor interfaces directly to the inbound serial-parallel and outbound parallel-serial
converters of the receive and transmit serial interfaces. Words are received and transmitted
on every clock cycle, eliminating the need for polling or interrupts to determine whether
data is ready to be read or written. The processor does not have any hardwired logic for
recognizing packets, frames, or any other asynchronously-arriving protocol-specific entities.
The emphasis is on individual words, which arrive synchronously with instruction
execution. Any framing functionality is performed by software. Thus, the processor may be
programmed to handle any network protocol.

Figure 1 is a block diagram illustrating the main functional units of a synchronous
network data processor 100 in accordance with an embodiment of the present invention. As
illustrated, the synchronous network data processor 100 includes a data compare unit 110, a
data modify unit 120, an execution control unit 130, a peripheral unit 140, an input pipeline
unit 150, an instruction memory 160, and a bank of general-purpose registers 170. The
peripheral unit 140 of the illustrated embodiment includes control signal decoders 141,
counters 142, control registers 144, an external memory interface 146, and a local interface
148. In the preferred embodiment, instruction memory 160 is a 128-word instruction
memory, and register bank 170 includes sixteen banks of 40-bit registers. Data are
communicated between the main functional units via 40-bit wide data paths, corresponding
to four ten-bit undecoded input characters and four eight-bit decoded characters plus control
or status bits. Forty-bit wide data paths illustrated in Figure 1 include: PTPIPE_A,
PTPIPE_B, INPIPE_A, INPIPE_B, IMMDATA_1, IMMDATA_2, REG_RD_DATA1,
REG_RD_DATA2, PERIPH_WR, DM_PERIPH_RD, DC_PERIPH_RD, and
REG_WR_DATA. Also illustrated are address busses and control signal paths such as
PIPE_CTRL, CTRL_REG, DM_CTRL, DC_CTRL, INSTRUCTION,
COMPARE_FLAGS, PERIPH_FLAG, START_STOP, IWR_ADDR, IWR_DATA,
DM_PERIPH_CTRL, DM_REG_CTRL, DC_PERIPH_CTRL, and DC_REG_CTRL. For
simplicity, some address busses and control signals are omitted in Figure 1.


The input pipeline unit 150, in the present embodiment, includes four 40-bit wide by
16-stage pipeline registers for the input busses. Two of these pipelines (INPIPE_A,
INPIPE_B) feed data from input busses IN0 and IN1 to the data compare unit 110 and data
modify unit 120; the other two pipelines (PTPIPE_A, PTPIPE_B) are used for automatic
pass-through of data from the input busses IN0 and IN1 to output busses OUT0 and OUT1
without program intervention. The input pipeline unit 150 is driven by an externally
generated clock signal CLK. Particularly, each pipeline of the input pipeline unit 150 is
operable for receiving/outputting one word during one cycle of the clock signal CLK. The
pipeline stages from which the outputs are taken are selectable by control signals
PIPE_CTRL and CTRL_REG. The signal PIPE_CTRL is generated by the execution
control unit 130 based on a currently executed instruction. The control signal CTRL_REG
is generated by the control registers 144 based on the values stored therein by the execution
control unit 130 in previous execution cycles.

In the present embodiment, the execution control unit 130 executes one instruction
at every instruction cycle. Instructions are fetched and executed from the internal
instruction memory 160. Any results the instruction generates may be used in the following
instruction. Instruction execution may be interrupted by a trap, which can be generated
either internally or from the external interrupt pins. Traps transfer control either to a fixed
address or a relative offset from the current program counter (PC); the trap address,
absolute/relative mode, and condition are all software-programmable. Every instruction
may execute conditionally. Further, every instruction may specify up to two different
conditional relative branches, each with its own destination address. Conditional execution
control fields are shared with the control fields for the second branch. Therefore, if
conditional execution is used, the second branch must be disabled or use the same condition.

The processor 100 can execute two types of instructions: data compare instructions
and data modify instructions. Data compare instructions are for generating control signals
that control the data compare unit 110; data modify instructions are for generating control
signals that control the data modify unit 120.

Significantly, the execution control unit 130 is synchronous with the input pipeline
unit 150. That is, both the execution control unit 130 and the input pipeline unit 150 are
driven by the same externally generated clock signal CLK. During each cycle of the clock
signal CLK, one data word is received by each pipeline of the input pipeline unit 150 and
one instruction is executed by the execution control unit 130. This is significantly different
from conventional microprocessors where data is required to be moved in and out of
registers or internal memory and where the instruction clock is not synchronous with the I/O
clock.

With reference still to Figure 1, the data compare unit 110 is operable for selectively
performing mask/match comparisons of two instruction-specified operands during each
instruction cycle. In the present embodiment, the instruction-specified operands may come
from the input pipeline unit 150 (via INPIPE_A, INPIPE_B), the register bank 170 (via
REG_RD_DATA2), the peripheral unit 140 (via DC_PERIPH_RD), and the execution control
unit 130 (via IMMDATA_1, IMMDATA_2). The mask/match and compare operations
performed by the data compare unit 110 are instruction-specified. In particular, the
mask/match and compare operations performed are specified by the control signal
DC_CTRL, which is generated by the execution control unit 130 based on the currently
executed instruction. The data compare unit 110 stores the results of the mask/match
comparisons to a set of compare flags, which are provided to the execution control unit 130
and peripheral unit 140 (via COMPARE_FLAGS). The set of compare flags may be used
by the execution control unit 130 and the peripheral unit 140 in the next instruction cycle to
conditionally branch, execute, trap, increment a counter, etc. In the present embodiment,
there is one compare flag for each 8-bit byte of the 40-bit input word, allowing multiple
independent byte comparisons as well as whole 40-bit word comparisons in one instruction.
Also illustrated in Figure 1 are the DC_REG_CTRL and the DC_PERIPH_CTRL signal
paths that communicate addresses and commands from the data compare unit 110 to the
register bank 170 and the peripheral unit 140, respectively.



The data modify unit 120 of the present embodiment includes arithmetic logic units
(ALUs) operable for performing arithmetic and logic operations using instruction-specified
operands and operators. In the present embodiment, instruction-specified operands and
operators may come from the input pipeline unit 150 (via INPIPE_A, INPIPE_B), the
register bank 170 (via REG_RD_DATA1), the peripheral unit 140 (via DM_PERIPH_RD), and
the execution control unit 130 (via IMMDATA_1, IMMDATA_2). Using the
instruction-specified operands and operators, the data modify unit 120 generates output data
words that are provided to the output busses OUT0 and OUT1, the register bank 170 (via
REG_WR_DATA), and/or the peripheral unit 140 (via PERIPH_WR). The data modify
unit 120 also allows instruction-specified data to pass through unaltered to the output busses
OUT0 and OUT1. The modification operations performed by the data modify unit 120 are
instruction-specified. In particular, the data modifications performed by the data modify
unit 120 are specified by the control signal DM_CTRL, which is generated by the execution
control unit 130 according to the currently executed instruction. Also illustrated are the
DM_REG_CTRL and the DM_PERIPH_CTRL signal paths that communicate addresses
and commands from the data modify unit 120 to the register bank 170 and peripheral unit
140, respectively.

With reference still to Figure 1, the peripheral unit 140 includes four 20-bit counters
142, control registers 144, an external memory/peripheral interface 146, and a local
interface 148. The local interface 148 allows a host computer to download instructions to
the instruction memory 160 via IWR_ADDR and IWR_DATA busses, and to control the
operations of the processor 100 via START_STOP signals and PERIPH_FLAGS. In
addition, the control registers 144 generate the CTRL_REG signal for controlling the
operations of the pass-through pipes of the input pipeline unit 150. The local interface 148
also allows the host computer to communicate with the processor 100 via shared mailbox
registers (not shown). The counters 142 may be cascaded to give two 40-bit counters or one
40-bit and two 20-bit counters. Each counter 142 has an independently programmable
increment enable, allowing it to increment in different modes: synchronously at every clock
cycle, selectively when a register is written, or based on a mask/match of the compare flags
generated by the data compare unit 110. Additionally, one or two counters 142 may be used
as an address generator for the external memory/peripheral interface 146. The data modify
unit 120 may configure the counters 142 and the control registers 144 by communicating
appropriate data via the PERIPH_WR bus.
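
By way of illustration only, the cascading of two 20-bit counters into one 40-bit counter may be sketched in software as follows. This is a minimal behavioral sketch, not the disclosed circuit. The names Counter20 and increment_cascaded are hypothetical, and the sketch assumes that the upper counter advances once each time the lower counter wraps to zero.

```python
MASK20 = (1 << 20) - 1  # each counter 142 is 20 bits wide


class Counter20:
    """Hypothetical model of a single 20-bit counter."""

    def __init__(self, value=0):
        self.value = value & MASK20

    def increment(self):
        """Add one; return True when the counter wraps around to zero."""
        self.value = (self.value + 1) & MASK20
        return self.value == 0


def increment_cascaded(low, high):
    """Assumed cascade rule: a wrap of the low counter increments the high one.
    Returns the combined 40-bit count."""
    if low.increment():
        high.increment()
    return (high.value << 20) | low.value


low = Counter20(MASK20)  # low counter is one step away from wrapping
high = Counter20(0)
combined = increment_cascaded(low, high)
```

In this sketch the combined value behaves as a single 40-bit count, so after the call above the low half is zero and the high half is one.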

An Exemplary Implementation of the Input Pipeline Unit

An exemplary implementation of the input pipeline unit 150 according to one
embodiment of the invention is illustrated in Figures 2A and 2B. Figure 2A illustrates two
input pipelines 210 and 220, and Figure 2B illustrates two pass-through pipelines 230 and
240. Pipelines 210, 220, 230 and 240 each include sixteen 40-bit wide registers 214
(herein called 16-stage pipeline registers) that are driven by the clock signal CLK.

As illustrated in Figure 2A, input pipeline 210 includes a multiplexer 212 that
selectively provides data from either one of the input busses IN0 and IN1 to the 40-bit wide
by 16-stage pipeline registers 214 according to a control signal PA_SRC provided by the
control registers 144 of the peripheral unit 140. Likewise, input pipeline 220 includes a
multiplexer 212 that selectively provides data from either one of the input busses IN0 and
IN1 to the pipeline registers 214 according to a control signal PB_SRC, which is also
provided by the control registers 144.

In the illustrated embodiment, each stage of the pipeline registers 214 includes an
output for outputting one of the input data words after a delay of a number of clock cycles
corresponding to a position of the respective stage in the pipeline. The outputs of the
pipelines 210 and 220 are determined by the pipeline stage select multiplexers 216, which
select the stages from which the outputs are taken. The particular stages of the pipelines
210 and 220 from which the outputs are selected are controlled by control signals
PA_WORD_SEL and PB_WORD_SEL, which are generated by the execution control unit
130 in accordance with the currently executed instruction.
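
By way of illustration only, the behavior of an input pipeline with a source multiplexer and a stage select multiplexer may be modeled in software as follows. This is a minimal behavioral sketch, not the disclosed hardware. The class name InputPipeline, its method names, and the convention that stage 0 holds the most recently clocked word are assumptions made for the example.

```python
from collections import deque

WORD_MASK = (1 << 40) - 1  # 40-bit data words
NUM_STAGES = 16            # sixteen pipeline registers per pipeline


class InputPipeline:
    """Hypothetical behavioral model of one input pipeline (e.g., pipeline 210)."""

    def __init__(self):
        # Assumption: stages[0] holds the most recently clocked word,
        # stages[15] the oldest word still held in the pipeline.
        self.stages = deque([0] * NUM_STAGES, maxlen=NUM_STAGES)

    def clock(self, in0, in1, src_sel):
        """Model one CLK cycle: the source multiplexer picks IN0 or IN1
        (src_sel stands in for PA_SRC or PB_SRC) and the word is shifted in."""
        word = in1 if src_sel else in0
        self.stages.appendleft(word & WORD_MASK)  # oldest word falls off the end

    def tap(self, word_sel):
        """Model the stage select multiplexer (word_sel stands in for
        PA_WORD_SEL or PB_WORD_SEL)."""
        return self.stages[word_sel]


pipe = InputPipeline()
for word in range(1, 6):  # clock in the words 1, 2, 3, 4, 5 from IN0
    pipe.clock(in0=word, in1=0, src_sel=0)

newest = pipe.tap(0)      # word clocked in most recently
three_back = pipe.tap(3)  # word clocked in three cycles earlier
```

Because every stage is readable, an instruction in this model can examine a word that arrived several clock cycles earlier without having copied it to a register first.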

Pass-through pipelines 230 and 240 of Figure 2B are used for automatic
pass-through of unmodified data from the input busses IN0 and IN1 to the output busses
OUT0 and OUT1 without program intervention. Similar to pipelines 210 and 220, each
stage of the pipeline registers 214 includes an output for outputting one of the input data
words after a delay of a number of instruction cycles corresponding to a position of the
respective stage in the pipeline. The outputs of the pipelines 230 and 240 are determined by
the pipeline stage select multiplexers 226, which select the stages from which the outputs
are taken. The particular stages of the pipelines 230 and 240 from which the outputs are
selected are controlled by control signals P0_WORD_SEL and P1_WORD_SEL, which are
provided by the control registers 144 of the peripheral unit 140.



An Exemplary Implementation of the Data Compare Unit

An exemplary implementation of the data compare unit 110 is illustrated in Figures
3A-3C. As shown in Figure 3A, the data compare unit 110 includes source select and mask
units 310, comparators 320 and flag update units 330. Each source select and mask unit 310
is configured for receiving data from the input pipeline unit 150 (via INPIPE_A,
INPIPE_B), the register bank 170 (via REG_RD_DATA2), the peripheral unit 140 (via
DC_PERIPH_RD) and the execution control unit 130 (via IMMDATA_1, IMMDATA_2).
The source select and mask units 310 perform instruction-specified masking operations on
the data to generate masked data and comparands to be provided to the comparators 320.
The comparators 320 perform comparisons or "matching" operations between the masked
data and the comparands to generate match outputs, which are provided to the flag update
units 330. The flag update units 330 in turn generate a set of compare flags DC0, DC1,
DC2, DC3 and DC4 based on instruction-specified flag update modes.

In the present embodiment, there is one compare flag for each 8-bit byte of the 40-bit
input word, allowing multiple independent byte comparisons as well as whole 40-bit word
comparisons in one instruction. It should be appreciated that the data to be masked and the
comparands to be generated by the source select and mask units 310 are
instruction-specified. Specifically, each of the select and mask units 310 receives the
control signal DC_CTRL, which is generated by the execution control unit 130 according to
a currently executed instruction.
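
By way of illustration only, a per-byte mask/match comparison over a 40-bit word may be sketched in software as follows. This is a minimal sketch under stated assumptions, not the disclosed logic. The function names byte_lanes and mask_match are hypothetical; the sketch assumes that lane n (bits 8n to 8n+7) drives flag DCn, and that the mask is applied to the data only while the comparand is compared unmasked.

```python
def byte_lanes(word):
    """Split a 40-bit word into five 8-bit lanes; lane 0 is bits 0-7."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(5)]


def mask_match(data, mask, comparand):
    """Return the per-lane match results, ordered [DC0, DC1, DC2, DC3, DC4].
    Assumption: each lane computes (data AND mask) == comparand."""
    return [
        (d & m) == c
        for d, m, c in zip(byte_lanes(data), byte_lanes(mask), byte_lanes(comparand))
    ]


data = 0x12345678BC
mask = 0xFF00FF0FFF       # lane 3 fully ignored, lane 1 checks its low nibble only
comparand = 0x12005608BD  # lane 0 differs from the data (0xBD versus 0xBC)

flags = mask_match(data, mask, comparand)
whole_word_match = all(flags)  # a 40-bit match requires all five lanes to match
```

In this example lanes 1 through 4 match and lane 0 does not, so four independent byte comparisons succeed while the whole-word comparison fails.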



Figure 3B illustrates an exemplary implementation of a source select and mask unit
310 in accordance with an embodiment of the present invention. As illustrated, the source
select and mask unit 310 includes 8-bit multiplexers 342a-342f. Although it is not
illustrated in Figure 3B, it is appreciated that the multiplexers 342a-342f are controlled by
the signal DC_CTRL. Thus, the sources of the data, the mask and the comparand are
specified by the currently executed instruction.

It should also be noted that the data paths within the illustrated source select and
mask unit 310 are only eight bits wide. For example, the source select and mask unit 310
processes bit-0 to bit-7 of the 40-bit wide data. The remaining bits of the 40-bit data words
are handled by the other source select and mask units 310 of the data compare unit 110.

As illustrated, multiplexers 342a-342c each include inputs for receiving data from
the input pipeline unit 150 (via INPIPE_A and INPIPE_B). The output of the multiplexer
342a is coupled to one of the inputs of multiplexer 342d, which also receives data from the
register bank 170 (via REG_RD_DATA2) and from the peripheral unit 140 (via
DC_PERIPH_RD). Thus, by applying the appropriate control signals, the output of the
multiplexer 342d, which is the data to be masked, can be chosen from any one of these
sources. Similarly, because multiplexer 342e is coupled to receive data from the input
pipeline unit 150 (via multiplexer 342b), the register bank 170, or the execution control
unit 130 (via IMMDATA_1), the output of the multiplexer 342e, which is the mask data, may
be chosen from any one of these data sources. The outputs of multiplexers 342e-342f are
coupled to an AND-gate 344, which performs a masking operation on the data. In the present
embodiment, the comparand may be selected from data within the input pipeline unit 150,
the register bank 170, the peripheral unit 140 or the execution control unit 130 (via
IMMDATA_2) when appropriate control signals are applied to multiplexers 342c and 342f.

Figure 3C is a block diagram illustrating an exemplary flag update unit 330 in
accordance with an embodiment of the present invention. The flag update unit 330 provides
additional programmability and flexibility to the processor 100 by allowing the instruction
to specify how the compare flags are updated. Particularly, as illustrated in Figure 3C, the
flag update unit 330 includes an AND-gate 332, an OR-gate 334, and an XOR-gate 336, each
having an input for receiving a comparison result from a comparator 320. The outputs of
the logic gates are coupled to inputs of multiplexer 338. Responsive to a flag update mode
control signal generated by the execution control unit 130, the multiplexer 338 selects one
of the outputs of AND-gate 332, OR-gate 334, XOR-gate 336, or the comparison results
from the comparator 320, to be provided to a memory element 342 (e.g., a D-flip-flop). The
output of the memory element 342 is fed back to the inputs of the logic gates 332, 334 and
336 to form feed-back loops. In this way, the flag update unit 330 updates the compare
flags according to the instruction and according to the state of the compare flags in a
previous instruction cycle. It should be noted that the memory element 342 is synchronous
with the clock signal CLK that drives the input pipeline unit 150 and the execution control
unit 130. Thus, the updated compare flags are provided to the execution control unit 130
for use in the next clock cycle.
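
By way of illustration only, the feedback behavior of a flag update unit may be sketched in software as follows. This is a minimal behavioral sketch, not the disclosed circuit. The class name FlagUpdateUnit and the mode names "and", "or", "xor" and "direct" are hypothetical stand-ins for the flag update mode control signal, whose actual encoding is not described here.

```python
class FlagUpdateUnit:
    """Hypothetical model of one flag update unit 330."""

    def __init__(self):
        self.flag = False  # state of the memory element (e.g., a D-flip-flop)

    def clock(self, match, mode):
        """Model one CLK cycle: combine the comparator result with the
        previous flag value according to the selected update mode."""
        candidates = {
            "and": self.flag and match,  # AND-gate with fed-back flag
            "or": self.flag or match,    # OR-gate with fed-back flag
            "xor": self.flag != match,   # XOR-gate with fed-back flag
            "direct": match,             # comparison result passed unchanged
        }
        self.flag = candidates[mode]     # the multiplexer selects one candidate
        return self.flag


unit = FlagUpdateUnit()
history = [
    unit.clock(True, "direct"),  # first word of a pattern matched
    unit.clock(True, "and"),     # second word matched as well
    unit.clock(False, "and"),    # third word failed, so the flag clears
]
```

One use of the feedback in this sketch is multi-word pattern detection: the first comparison loads the flag directly and later comparisons are ANDed in, so the flag stays set only while every word matches.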

An Exemplary Implementation of the Data Modify Unit 

Figure 4 is a block diagram illustrating an exemplary implementation of the data 
modify unit 120 in accordance with an embodiment of the present invention. According to 
the present invention, the data modify unit 120 may access any instruction-specified data 
stored within the input pipeline unit 150, and modify the instruction-specified data using an
instruction-specified operator during one instruction cycle. The data modify unit 120 may
also allow data to pass through without any modification.

Particularly, as illustrated in Figure 4, the data modify unit 120 includes two
multiplexers 410a-410b, which are operable to receive data from the input pipeline unit 150
(via INPIPE_A, INPIPE_B), the register bank 170 (via REG_RD_DATA1), or the
peripheral unit 140 (via DM_PERIPH_RD). The outputs of the multiplexers 410a-410b are
coupled to ALUs 420a-420b, which also receive data from the execution control unit 130 as
operands (via IMMDATA_1, IMMDATA_2). The outputs of the ALUs 420a-420b are
provided as inputs to another ALU 420c. The outputs of the ALUs 420a-420c are also
provided to multiplexers 430a-430b. The multiplexers 430a-430b are also coupled to
receive data directly from the pass-through pipelines PTPIPE_A and PTPIPE_B of the input
pipeline unit 150. The control signals out0_src and out1_src, received from the control
registers, are for selecting the inputs to the output multiplexers 430a and 430b, respectively.
The outputs of the multiplexers 430a-430b are coupled to output registers 440a-440b, which
provide data to the output busses OUT0 and OUT1 of the processor 100.

According to the present embodiment, the sources of the data to be modified, as well as
the operators, are instruction-specified. Particularly, the data modify unit 120 receives the
control signals SRC1_SEL, SRC2_SEL, op1, op2, op3 (via control signal bus DM_CTRL),
which are generated by the execution control unit 130 according to the current instruction.
The control signals SRC1_SEL and SRC2_SEL are for selecting the inputs of multiplexers
410a-410b. The control signals "op1", "op2", and "op3" are for controlling the logic
operations of ALUs 420a-420c. Thus, by using appropriate instructions, the data modify
unit 120 may be configured for performing a variety of instruction-specified data
modification operations during each clock cycle to generate the desired data for output.
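
By way of illustration only, the two-level ALU arrangement may be sketched in software as follows. This is a minimal sketch under stated assumptions, not the disclosed datapath. The operator names in OPS are hypothetical, since the set of ALU operations is not enumerated here, and the function name data_modify is likewise an assumption.

```python
WORD_MASK = (1 << 40) - 1  # 40-bit data words

# Hypothetical operator set; the actual ALU operations are not enumerated.
OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "add": lambda a, b: (a + b) & WORD_MASK,
    "a": lambda a, b: a,  # pass the first operand through unchanged
}


def data_modify(src1, src2, imm1, imm2, op1, op2, op3):
    """Model ALUs 420a and 420b feeding ALU 420c.
    src1 and src2 stand in for the outputs of multiplexers 410a and 410b;
    imm1 and imm2 stand in for IMMDATA_1 and IMMDATA_2."""
    out_a = OPS[op1](src1, imm1)   # ALU 420a
    out_b = OPS[op2](src2, imm2)   # ALU 420b
    out_c = OPS[op3](out_a, out_b)  # ALU 420c combines the two results
    return out_a, out_b, out_c


# Replace the low byte of an input word with the low byte of a second source.
word = 0x1122334455
replacement = 0x00000000EE
_, _, jammed = data_modify(word, replacement, 0xFFFFFFFF00, 0xFF, "and", "and", "or")
```

In this sketch one instruction clears the low byte of the first source, isolates the low byte of the second, and merges the two, which is the kind of single-cycle word modification the arrangement is intended to support.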
Exemplary Applications of the Processor of the Present Invention

Figure 5 is a block diagram illustrating a high-speed data modification system 520
coupled between network devices 510 and 512. As illustrated, network devices 510 and
512 communicate with one another via high speed communication paths 514 and 516.
Inserted into the high speed communication paths 514 and 516, the data modification
system 520 enables real-time system-level testing of the devices 510 and 512 by injecting
errors into the communication paths 514 and 516, and monitoring the responses of the
devices 510 and 512.



As illustrated, data modification system 520 includes two trace memories 522 for
capturing the data that are communicated between the devices 510 and 512 for output to an
analyzer. Additionally, data modification system 520 includes a trigger subsystem 526 and
two data jammers 524. The trigger subsystem 526 monitors the data paths 514 and 516,
waiting for a datum in the streams to match a predefined pattern. When the trigger
subsystem 526 detects an input datum matching the predefined pattern, the trigger
subsystem 526 generates a trigger signal to the data jammers 524. The data jammers 524
respond to the trigger signal by "jamming", that is, altering selected portions of the input
datum in a predefined manner in real time.



The trigger subsystem 526 and the data jammers 524 may be implemented with the
high-speed synchronous network data processor of the present invention. Particularly, one
synchronous network data processor 100 may be used to implement the trigger subsystem
526 by loading appropriate data compare instructions and data modify instructions into the
processor. Each of the data jammers 524 may also be implemented with a synchronous
network data processor 100 by loading appropriate instructions therein. A significant
advantage of using the synchronous network data processor of the present invention in the
data modification system 520 is that the system may be re-programmed for different types of
protocols as well as to perform different tasks.



Application of the synchronous network data processor of the present invention is not
limited to data modification systems. Figure 6 is a block diagram illustrating a general
network data processing system 600 implemented with synchronous network data
processors of the present invention. As shown, the general network data processing system
600 includes four synchronous network data processors 100 interconnected by an
interconnect fabric 670. Also interconnected by the interconnect fabric 670 are a FIFO
module 610, a RAM module 620, a CAM module 630, I/O modules 640, an RX data path
650, and a TX data path 660. According to the present invention, the RX data path 650 is an
inbound serial-to-parallel interface, and the TX data path module 660 is an outbound
parallel-to-serial interface. The I/O modules 640 are for coupling the network data
processing system 600 to data analyzers and other network data processing systems.



Branch Control and Conditional Execution of Instructions by the Processor

According to the present invention, the processor 100 may execute every instruction
conditionally. Further, every instruction may specify up to two different conditional relative
branches, each with its own destination address. In the present embodiment, conditional
execution control fields are shared with the control fields for the second branch. If
conditional execution is used, the second branch must be disabled or must use the same
condition.



The bits that are examined when determining whether to conditionally branch,
execute, or trap are referred to as the "flags," and are held in the flags register of the
execution control unit 130. There are six flags in total, which include the five flags
generated by data compare instructions (DC4-DC0) and one programmable "P" flag
generated by the peripheral unit 140. The "P" flag is selectable from one of several sources
including counter wrap flags, the external memory interface ready signal, and the carry
output of the data modify unit 120. The format of the flags register is shown below in Table
1.

Table 1

Bit     39-6       5    4     3     2     1     0
Name    Reserved   P    DC4   DC3   DC2   DC1   DC0

A branch or execute condition is specified by three fields: Mask, Match, and
True/False. Mask and Match are the same width as the flags register (40-bit), and True/False
is a single bit. The execution control unit 130 evaluates the condition by logically ANDing
the flags with Mask, and then comparing this result to Match. If the comparison result
(True if equal, False if not equal) is the same as the True/False bit, the condition is
considered satisfied and the branch or conditional execution takes place.
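
By way of illustration only, the condition evaluation described above may be expressed in software as follows. This is a minimal sketch; the function name condition_met and the constant names are hypothetical, and the bit positions follow Table 1.

```python
# Bit positions of the flags register, per Table 1.
DC0, DC1, DC2, DC3, DC4, P = (1 << n for n in range(6))


def condition_met(flags, mask, match, true_false):
    """AND the flags with Mask, compare the result to Match, and report
    whether the comparison outcome equals the True/False bit."""
    return ((flags & mask) == match) == true_false


flags = DC4 | DC1 | DC0  # example state: DC4, DC1 and DC0 are set

both_low_set = condition_met(flags, DC1 | DC0, DC1 | DC0, True)  # DC1 and DC0 set?
dc4_clear = condition_met(flags, DC4, 0, True)                   # DC4 clear?
dc4_not_clear = condition_met(flags, DC4, 0, False)              # inverted sense
```

Under this rule, a Mask of zero with a Match of zero and a True/False bit of True is satisfied for every flag state; whether the assembler uses that particular encoding for an always-satisfied condition is an assumption.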

The branch conditions and the execution conditions of an instruction are defined by 
its common control fields. The syntax and operations of the common control fields are 
described below in Table 2. 

Table 2

Common Control Field:
    br(mask1, match1, tf1, addr1, mask2, match2, tf2, addr2)
Function:
    Conditional branch control. The two conditions are evaluated as described
    above. If condition 1 is satisfied, a branch is taken to addr1. Otherwise, if
    condition 2 is satisfied, a branch is taken to addr2. Otherwise, control
    transfers to the following instruction. Legal values are any 6-bit constant
    for the mask and match fields, T or F for the tf field, and a 12-bit constant
    or a label (string) for addr1 and addr2.

    The second branch condition and address may be omitted if not used. If no
    branch control field is given at all, control falls through to the next
    instruction.

    The second branch condition is shared with the execute condition; therefore,
    if both conditional execution and the second branch are used, their
    conditions must be the same.

    When the second branch is not specified, the assembler encodes either an
    always-satisfied condition or the execute condition specified by the
    exec_on() field. In each case, the second branch target is the next
    instruction. When neither branch is specified, the assembler encodes
    always-satisfied conditions for both branches, and the next instruction for
    both branch targets.

    Address 0xF80 has a special function when used as the branch 2 address. It
    causes a branch to the program counter (PC) saved by a previous subroutine
    call and is used to return from the subroutine. The branch 2 mask/match/tf
    controls still function normally, allowing conditional returns.

Common Control Field:
    exec_on(mask, match, tf)
Function:
    Conditional execution control. The condition is evaluated as described
    above. If it is satisfied, the instruction executes; otherwise it does not
    execute (it is treated as a no-op). All common control fields with the
    exception of bg_run are active regardless of whether the instruction
    executes or not.

    The execute condition is shared with the second branch condition (see
    above).

    If no conditional execution control field is specified, the instruction
    executes.

Common Control Field:
    save_pc(ctrl)
Function:
    Save the current program counter (PC). Used to implement subroutine calls.
    The ctrl field defines how the PC is saved:
        0: do not save the PC
        1: store current address + 1 to saved_PC
           (subroutine returns to the next instruction)
        2: store branch address 2 to saved_PC
           (subroutine returns to branch address 2; branch 2 still behaves
           normally)
        Others: reserved

Common Control Field:
    bg_run
Function:
    When present, causes the instruction to run in the background (i.e., execute
    continuously until interrupted by the execution of another instruction of
    the same type). If not present, the instruction executes for the present
    instruction cycle only. Once an instruction is running in the background, it
    is no longer subject to any execution condition it may have been issued
    with.

    An interruption of a background-running instruction occurs only if the
    interrupting instruction actually executes; i.e., its execution condition is
    satisfied.

    While background run mode is only supported for data compare instructions in
    one preferred embodiment, in an alternate embodiment background run mode is
    supported for both data compare and data modify instructions.
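
The interaction of the two branch conditions, save_pc, and the special return address 0xF80 can be modeled with a short C sketch. The type and function names (branch_t, exec_state_t, step_control) are hypothetical, and two details are assumptions not stated in the text: that the branch target is resolved before saved_PC is updated, and that save_pc takes effect regardless of which branch is taken.

```c
#include <stdbool.h>
#include <stdint.h>

#define RETURN_ADDR 0xF80u   /* special branch 2 address: return to saved PC */

typedef struct {             /* one mask/match/tf/addr group of a br() field */
    uint64_t mask, match;
    bool     tf;
    uint16_t addr;
} branch_t;

typedef struct {
    uint16_t pc;             /* 12-bit program counter */
    uint16_t saved_pc;       /* PC saved by save_pc() */
} exec_state_t;

static bool branch_cond(uint64_t flags, const branch_t *b)
{
    return ((flags & b->mask) == b->match) == b->tf;
}

/* Resolve the next PC for one instruction. save_pc_ctrl is 0, 1, or 2. */
void step_control(exec_state_t *s, uint64_t flags,
                  const branch_t *b1, const branch_t *b2, int save_pc_ctrl)
{
    uint16_t next = (uint16_t)((s->pc + 1u) & 0xFFFu);
    uint16_t target;

    if (branch_cond(flags, b1))
        target = b1->addr;
    else if (branch_cond(flags, b2))
        target = (b2->addr == RETURN_ADDR) ? s->saved_pc : b2->addr;
    else
        target = next;

    if (save_pc_ctrl == 1)
        s->saved_pc = next;          /* subroutine returns to next instruction */
    else if (save_pc_ctrl == 2)
        s->saved_pc = b2->addr;      /* subroutine returns to branch address 2 */

    s->pc = (uint16_t)(target & 0xFFFu);
}
```

In this model a call is an always-satisfied first branch with save_pc(1), and a return is a never-satisfied first branch followed by an always-satisfied second branch to 0xF80.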



Some pseudo-control operations that can be implemented using the execution 
control fields are shown below in Table 3. Appropriate macros for these can be defined in a 
standard header file. Software written using the pseudo-control codes may be translated 
into the processor-specific common control fields using a pre-processor. 



Table 3

Pseudo-control   Operation                                  Implementation
--------------   ----------------------------------------   ----------------------------------
jmp              Jump to address (unconditionally)          br(0, 0, T, addr)
jsr              Jump to subroutine (unconditionally)       br(0, 0, T, subr) save_pc(1)
jsrr             Jump to subroutine; return to specified    br(0, 0, T, subr, 0, 0, T, retaddr)
                 address (unconditionally)                  save_pc(2)
ret              Return from subroutine (unconditionally)   br(0, 0, F, 0, 0, 0, T, 0xF80)
bcs              Branch if carry set                        br(0x20, 0x20, T, addr)
                 (P = DM carry flag)
bcc              Branch if carry clear                      br(0x20, 0x20, F, addr)
                 (P = DM carry flag)
loop             Jump if still in loop                      br(0x20, 0x20, F, addr)
                 (P = counter wrap flag)
exec_loopend     Execute on end of loop                     exec_on(0x20, 0x20, T)
                 (P = counter wrap flag)
br_c8t/f         Branch on 1-5 byte comparison              br(0x01, 0x01, T/F, addr)
br_c16t/f        true/false                                 br(0x03, 0x03, T/F, addr)
br_c24t/f                                                   br(0x07, 0x07, T/F, addr)
br_c32t/f                                                   br(0x0f, 0x0f, T/F, addr)
br_c40t/f                                                   br(0x1f, 0x1f, T/F, addr)



Data Compare Instructions Executable by the Processor 

Data compare instructions perform a three-operand (data, mask, and match)
comparison operation of up to 40 bits at a time. The sources of the data to be compared can
be the input pipeline unit 150, the register bank 170, the peripheral unit 140, and/or the
execution control unit 130. According to the present embodiment, the input pipelines are
fed from the processor's input busses IN0 and IN1, and the pipeline stage read by the
compare instruction can be selected on the fly by the currently executed instruction.

Data compare instructions are carried out by the data compare unit 110, which
includes five independent 8-bit comparators 330, each of which has selectable inputs for its
data, mask, and match values. Each comparator 330 updates its own comparison result flag,
which can be used as part of a conditional branch or execution condition. This flag can
either be set to the comparison result, or to the logical AND, OR, or XOR of the comparison
result and current flag value.

The syntax of a data compare instruction executable by the processor 100 is:

    compare data, mask, match [data compare specific control fields]
        [common control fields];

The C-equivalent logical operation performed by a data compare instruction is
described below in Table 4.



Table 4

for (comp = 0; comp < 5; comp++)    // do all 5 comparators
{
    // perform 8-bit mask/match comparison
    if ((data[comp] & mask[comp]) == match[comp])
        result[comp] = 1;
    else
        result[comp] = 0;

    // update comparison result flag (SET, AND, OR, or XOR)
    switch (update_mode)
    {
        case SET: flag[comp]  = result[comp]; break;
        case AND: flag[comp] &= result[comp]; break;
        case OR:  flag[comp] |= result[comp]; break;
        case XOR: flag[comp] ^= result[comp]; break;
    }
}


The compare flags are updated one clock after the instruction executes, and therefore 
may be used in the following instruction. Note that if a branch or execute condition is used 
in the same instruction as the compare, the flag values are those that existed BEFORE the 
compare instruction executes. 

Although data for the data compare instructions may come from numerous sources 
and may be specified on the fly by the currently executed instruction, there are a few 
limitations. Table 5 below shows the legal values for the three comparator source fields. 



Table 5

Source      Input         Input         Register   Peripheral   Immediate
            Pipeline A    Pipeline B    Bank       Data         data
Mnemonic    ina[n]        inb[n]        r[n]       periph[n]    [value]
data        YES           YES           YES        YES          NO
mask        YES           YES           YES        YES          YES
match       YES           YES           YES        YES          YES

The comparator source fields are also subject to the following restrictions:

(A) If an input pipe is used for the mask source, it may not be the same as that
used for the data.

(B) If the same input pipe is used in more than one source, the pipe word number
(n) (i.e., the point at which the input pipe is tapped) must be the same in both
uses.

(C) If a register or peripheral is used in more than one source, the number (n)
must be the same in both uses. The parameters of r and periph are the register
or internal peripheral number. Legal values for these parameters are 0-15.

The immediate data value is a 40-bit constant specified in the instruction. Two 
different values may be specified for the mask and match fields. 

The parameters of the input pipelines specify the stage in the input pipelines from 
which data are accessed. For example, an instruction including the field "ina[4]" indicates 
using the word in the fourth stage of input pipeline INPIPE_A. Legal values for these 
parameters are 0-15. The input bus feeding each pipeline and the pipeline enables are set by 
fields in the control registers 144. 



Table 6 shows the type-specific control fields that are supported by data compare 
instructions. 

Table 6

Control Field:
    byte_sel(c4, c3, c2, c1, c0)
Function:
    Selects the byte number of the 40-bit source word to apply to each
    comparator's data input. This field is only valid when using an input pipe
    as the data source, and has no effect otherwise. Legal values for c4-c0 are
    4-0 (byte 4 is the msb of the 40-bit input word, and byte 0 is the lsb). For
    the mask and match fields, or for non-input-pipe data sources, the byte
    number of the input word is the same as the comparator number; e.g., the
    third comparator uses byte 3 of the mask word.

    If this field is not given, the byte selects default to the previous values
    given, or 4,3,2,1,0 if no previous values were given.

Control Field:
    update_mode(mode)
Function:
    Used in conjunction with the FLAG_UPD_CFG field of the control registers to
    set the flag update mode for all comparators. The truth table for
    FLAG_UPD_CFG can be found in Appendix A. Legal values for mode are 0 and 1.
    If this field is not given, the mode defaults to the previous value given,
    or "0" if no previous value was given.

Data compare instructions may be run in background mode by applying the bg_run 
common control field to the instruction. In background run mode, a data compare 
instruction runs continuously, updating the compare flags, until the next compare instruction 
executes. Normal conditional branching and execution may be performed based on the flags 
generated by the background-running instruction. 



Instruction examples illustrating both legal and illegal uses of the data compare
instructions are shown below in Table 7.

Table 7

Code Examples / Description

compare ina[0], 0xffffffffff, 0x123456789a byte_sel(4, 3, 2, 1, 0)
    update_mode(SET);
        40-bit straight comparison of the word in the first stage of input pipe
        A to a constant. The word was equal to 0x123456789a if all five
        comparator flags are true after the instruction executes.

compare ina[0], 0xfffffffff0, 0x1234567890;
        Same as above but with the lower 4 bits masked off (ignored in the
        comparison). The control fields default to the previous values used if
        not specified.

compare ina[0], r[2], inb[8];
        Compare the first stage of input pipe A with the ninth stage of input
        pipe B, after masking the data in pipe B with data in r[2].

compare inb[12], r[8], periph[4];
        Compare Pipe B stage 12 with peripheral 4, using mask in r[8].

compare ina[1], r[2], inb[0];
        Compare a word in the input pipeline to the word received one clock
        ago. Assumes Pipes A and B both have the same source bus (in0 or in1).
        (The pipe source busses are set by bits in CTRL_REG.)

compare inb[4], ina[0], ina[0];
        See if all the bits set in the first stage of input pipe A are also set
        in the fifth stage of input pipe B.

compare inb[4], r[13], r[13];
        Same as above, but using registers.

compare ina[0], 0x0fffffffff, SOFi3 bg_run;
        Background run example: start up the compare unit looking for SOFi3 in
        the input data stream, and then let other instructions execute. "SOFi3"
        is a C-style definition of the numeric value of a "start of frame"
        ordered set.

compare ina[3], 0xffffffffff, 0x123456789a byte_sel(2, 2, 2, 2, 2);
        Byte_sel example: compare input pipe A stage 3 byte 2 with five
        different values (0x12, 0x34, 0x56, 0x78, and 0x9a). The five flags hold
        the results of the five comparisons.

compare ina[3], 0x73ff3f7ff8, 0x123456789a byte_sel(2, 2, 2, 2, 2);
        Same as above, but with five different 8-bit masks for the comparisons.

compare ina[3], 0xffffffffff, 0xaa12345678 byte_sel(4, 1, 0, 1, 0);
        Compare the 16-bit word in Pipe A stage 3 bytes 1-0 to two different
        values (0x1234 and 0x5678), and byte 4 to 0xaa.

compare ina[7], 0xffffffffff, WORD_A update_mode(SET);
compare ina[7], 0xffffffffff, WORD_B update_mode(AND);
compare ina[7], 0xffffffffff, WORD_C update_mode(AND);
        Update mode example: check whether WORD_A, WORD_B, and WORD_C are
        received in succession. The comparison flags are set on the first
        comparison, then ANDed with the current flags. The pipes advance 1 stage
        per instruction, so reading the same pipe word on successive
        instructions has the effect of reading successive input words. This
        could alternatively be done with conditional branching. If the five
        flags are true after execution of the third compare instruction, the
        three specified words have been received in succession.

compare ina[1], 0xff, ina[2];
compare r[2], 0xff, r[4];
compare ina[3], periph[2], periph[3];
compare inb[0], inb[0], 0xff;
compare 0xff, ina[1], r[2];
        Examples of illegal usages.
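
The byte_sel examples in Table 7 can be checked against a small C model of the five comparators. This is a sketch under stated assumptions, not the patented circuit: the names byte_of and compare40 are hypothetical, the 40-bit words are carried in 64-bit integers, and byte_sel is passed as an array indexed by comparator number (so the assembler field byte_sel(c4, c3, c2, c1, c0) corresponds to the array {c0, c1, c2, c3, c4}).

```c
#include <stdint.h>

/* Extract byte n of a 40-bit word (0 = least significant, 4 = most significant). */
static uint8_t byte_of(uint64_t word, int n)
{
    return (uint8_t)(word >> (8 * n));
}

/* Model of one data compare with SET update mode.
 * byte_sel[c] names the data byte routed to comparator c; the mask and match
 * bytes always use the comparator's own number. Returns the five result flags
 * packed as bits 4..0 (DC4..DC0). */
uint8_t compare40(uint64_t data, uint64_t mask, uint64_t match,
                  const int byte_sel[5])
{
    uint8_t flags = 0;
    for (int c = 0; c < 5; c++) {
        uint8_t d = byte_of(data, byte_sel[c]);
        uint8_t m = byte_of(mask, c);
        uint8_t t = byte_of(match, c);
        if ((uint8_t)(d & m) == t)
            flags |= (uint8_t)(1u << c);
    }
    return flags;
}
```

For instance, with byte_sel(2, 2, 2, 2, 2) and match 0x123456789a, an input word whose byte 2 is 0x56 sets only the flag of comparator 2, because 0x56 is byte 2 of the match word.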





Data Modify Instructions Executable by the Processor

A description of the data modify instructions executable by the processor 100 of the 
preferred embodiment follows. Data modify instructions perform arithmetic and logic 
operations using up to four operands and three operation codes (opcodes), and store the 
results to one or more write destinations. The instructions use the same sources as data 
compare instructions: the input pipeline unit 150, the register bank 170, the peripheral unit 
140, or immediate data from the execution control unit 130 as defined in the currently 
executed instruction. 



Data modify instructions are performed by the data modify unit 120, which includes
three two-operand arithmetic logic units ALU1-ALU3. ALU1 and ALU2 have their first
operand (X) selectable from among the input pipeline unit 150, the register bank 170, or the
peripheral unit 140. Their second operand (Y) is an immediate data value provided by the
execution control unit 130 and specified in the currently executed instruction. The operands
of ALU3 are the outputs of ALU1 and ALU2. ALU3 also generates a carry flag, which can
be selected as a source flag for conditional branching or execution.

An optional ALU-bypass mode is available to the instructions. In the ALU-bypass
mode, the results from ALU1 and ALU2 are provided to the output busses (OUT0 and
OUT1), bypassing ALU3. This mode allows both busses to be updated with one
instruction.

The data modify unit 120 also supports an internal pass-through mode where data
from the input pipeline unit 150 are provided directly to the output busses OUT0 and
OUT1. In this pass-through mode, "default" data can be supplied to the output busses
whenever data modify instructions are not executing. The pass-through operation is
configured by fields in the control registers 144 of the peripheral unit 140. The opcodes
supported by data modify instructions are shown below in Table 8. Operations are shown as
C equivalents.

Table 8

Opcode     Operation        Description                          Supported by ALUs
-------    -------------    ---------------------------------    -----------------
and        X & Y            Bitwise logical AND of X and Y       1, 2, 3
or         X | Y            Bitwise logical OR of X and Y        1, 2, 3
xor        X ^ Y            Bitwise logical XOR of X and Y       1, 2, 3
nor        ~(X | Y)         Bitwise logical NOR of X and Y       1, 2
ror8a      ror(X, 8) & Y    Rotate X right 8 bits, AND with Y    1
ror1a      ror(X, 1) & Y    Rotate X right 1 bit, AND with Y     1
rol8a      rol(X, 8) & Y    Rotate X left 8 bits, AND with Y     2
rol2a      rol(X, 2) & Y    Rotate X left 2 bits, AND with Y     2
add        X + Y            Sum of X and Y                       3
addp1      X + Y + 1        Sum of X and Y, plus 1               3
passimm    Y                Pass Y (immediate data) to result    1, 2
tbd12      tbd              tbd                                  1, 2
tbd3_a     tbd              tbd                                  3
tbd3_b     tbd              tbd                                  3
tbd3_c     tbd              tbd                                  3

Table 9 below shows pseudo-opcodes that may be implemented using the native
opcodes. Appropriate macros for these can be defined in a standard header file.

Table 9

Pseudo-op   Operation    Description                         Implementation                      Note
---------   ----------   ---------------------------------   ---------------------------------   ----
nop         (none)       No operation                        null = or(0, 0)
not         ~A           Bitwise inverse of A                xor(A, 0xffffffffff)
inc         A + 1        Increment A                         add(A, 1) or addp1(A, 0)
dec         A - 1        Decrement A                         add(A, 0xffffffffff)
sub         A - B        Difference of A and B               addp1(A, not(B))
subi        A - B        Difference of A and B, B constant   addp1(A, ~B)
neg         -A           Negate A                            addp1(0, not(A))
adc         A + C        Sum of A and carry                  add(A, 1) exec_on(0x20, 0x20, T)    1
sec         C = 1        Carry = 1                           add(1, 0xffffffffff)
clc         C = 0        Carry = 0                           add(0, 0)
testge      A >= B       Carry = 1 if A >= B, 0 if A < B     null = sub(A, B)
testnz      A != 0       Carry = 1 if A != 0, 0 if A = 0     null = add(A, 0xffffffffff)
testneg     A < 0        Carry = 1 if A < 0, 0 if A >= 0     null = add(A, 0x8000000000)
ror8        ror(A, 8)    Rotate A right 8 bits               ror8a(A, 0xffffffffff)
rol8        rol(A, 8)    Rotate A left 8 bits                rol8a(A, 0xffffffffff)
shr         A >> 1       Shift A right 1 bit                 ror1a(A, 0x7fffffffff)
shl         A << 1       Shift A left 1 bit                  add(A, A)
shr8        A >> 8       Shift A right 8 bits                ror8a(A, 0x00ffffffff)
shl8        A << 8       Shift A left 8 bits                 rol8a(A, 0xffffffff00)
shrn        A >> N       Shift A right N bits (N = 1..39)    (Various)                           2
shln        A << N       Shift A left N bits (N = 1..39)     (Various)                           2
bset        bset(A, N)   Set bit N in A                      or(A, 1 << N)
bclr        bclr(A, N)   Clear bit N in A                    and(A, ~(1 << N))
bswap01     bswap(0,1)   Swap bytes 0 and 1 in A,            or(ror8a(A, 0x00000000ff),
                         zero others                            rol8a(A, 0x000000ff00))
bswap12     bswap(1,2)   Swap bytes 1 and 2 in A,            or(ror8a(A, 0x000000ff00),
                         zero others                            rol8a(A, 0x0000ff0000))
bswap23     bswap(2,3)   Swap bytes 2 and 3 in A,            or(ror8a(A, 0x0000ff0000),
                         zero others                            rol8a(A, 0x00ff000000))
bswap34     bswap(3,4)   Swap bytes 3 and 4 in A,            or(ror8a(A, 0x00ff000000),
                         zero others                            rol8a(A, 0xff00000000))

Notes:

(1) Assumes the P flag is programmed to be the ALU3 carry flag. See the PERIPH_CTRL
register.

(2) Can be implemented with multi-instruction macros using the ror1a, ror8a, rol2a, and
rol8a opcodes. Worst case N requires 5 instructions.

Data modify instructions write their results to one or more of the following write
destinations: either of the two output busses OUT0 and OUT1, the register bank 170, or the
peripheral unit 140.

The syntax of the data modify instructions in normal mode is:

    dest1 [,dest2...] = op3(op1(src1, imm1), op2(src2, imm2)) [Common control
    fields];

ALU3 bypass mode is specified by assigning one or more of the output busses to the
ALU1 or ALU2 results, using the following syntax:

    dest1 [,dest2...] = op3(out0 = op1(src1, imm1), op2(src2, imm2)) [Common
    control fields];

    dest1 [,dest2...] = op3(op1(src1, imm1), out1 = op2(src2, imm2)) [Common
    control fields];

    dest1 [,dest2...] = op3(out0 = op1(src1, imm1), out1 = op2(src2, imm2))
    [Common control fields];

The first syntax places out0 in bypass mode. The second syntax places out1 in
bypass mode, and the third places both outputs in bypass mode. When an output is in bypass
mode, it is illegal to also use it as an ALU3 destination.

The operation codes op1-op3 are for ALUs 420a-420c, respectively; src1 and src2
are the selectable source fields for ALU 420a and ALU 420b, and imm1 and imm2 are the
two 40-bit immediate data values. The C-equivalent logic operation performed by a data
modify instruction is illustrated below in Table 10.

Table 10

result1 = alu12_operation(op1, src1, imm1);
result2 = alu12_operation(op2, src2, imm2);
if (out0_bypass)
    out0 = result1;
if (out1_bypass)
    out1 = result2;
dest(s) = alu3_operation(op3, result1, result2);

Additionally, the ALU3 carry flag is updated if the ALU3 opcode is "add" or
"addp1" (other opcodes and DC instructions do not change the carry flag value). The carry
is set if the addition overflowed, and cleared otherwise. In addition to arithmetic operations,
the carry flag (not shown) can be used as a general-purpose branch and execute control flag.
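
The carry behavior of the ALU3 "add" and "addp1" opcodes, and the way pseudo-opcodes such as sub and shr reduce to native opcodes, can be illustrated with the following C sketch. It is a model under assumptions rather than the actual datapath: the names alu3_add, pseudo_sub, ror40, and pseudo_shr are hypothetical, 40-bit words are held in 64-bit integers, and "overflow" is taken to mean an unsigned carry out of bit 39.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORD_MASK 0xFFFFFFFFFFULL   /* 40-bit data word */

typedef struct {
    uint64_t result;   /* 40-bit ALU3 result */
    bool     carry;    /* ALU3 carry flag after the operation */
} alu3_out_t;

/* ALU3 "add" (plus_one = false) or "addp1" (plus_one = true). */
alu3_out_t alu3_add(uint64_t x, uint64_t y, bool plus_one)
{
    uint64_t sum = (x & WORD_MASK) + (y & WORD_MASK) + (plus_one ? 1u : 0u);
    alu3_out_t out = { sum & WORD_MASK, (sum >> 40) != 0 };
    return out;
}

/* Pseudo-op sub: addp1(A, not(B)); carry = 1 exactly when A >= B (testge). */
alu3_out_t pseudo_sub(uint64_t a, uint64_t b)
{
    return alu3_add(a, ~b & WORD_MASK, true);
}

/* Rotate right within a 40-bit word, as used by ror1a and ror8a. */
uint64_t ror40(uint64_t x, unsigned n)
{
    x &= WORD_MASK;
    n %= 40;
    if (n == 0)
        return x;
    return ((x >> n) | (x << (40 - n))) & WORD_MASK;
}

/* Pseudo-op shr: rotate right by one, then mask off the wrapped-around bit. */
uint64_t pseudo_shr(uint64_t a)
{
    return ror40(a, 1) & 0x7FFFFFFFFFULL;
}
```

Because subtraction is performed as A + ~B + 1, the carry out of the 40-bit addition doubles as an unsigned "greater than or equal" test, which is why the testge pseudo-op can discard the result and keep only the flag.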

Table 11 below shows the legal sources for the source (src1 and src2) and
destination (dest) fields of a data modify instruction. Note that null can be specified for
dest, in which case the ALU3 result is ignored. The immediate data operands (imm1 and
imm2) are 40-bit constants specified in the instruction.

Table 11

Source/Dest   Input 0    Input 1    Register   Peripheral   Output   Output   None
              Pipeline   Pipeline   Bank       Data         Bus      Bus
Mnemonic      in0[n]     in1[n]     r[n]       periph[n]    out0     out1     null
src1          YES        YES        YES        YES          NO       NO       NO
src2          NO         YES        YES        NO           NO       NO       NO
dest          NO         NO         YES        YES          YES      YES      YES

The parameters of r and periph are the register or internal peripheral number. Legal
values for these parameters are 0-15.

The parameters of in0 and in1 are the word in the input pipeline register to operate
on. For example, in0[4] means use the word in stage 4 of the input 0 pipeline. Legal values
for these parameters are 0-15.

In the present embodiment, the source and destination fields are subject to the
following additional restrictions:

(A) If the same input pipe is used in more than one source, the pipe word number
(n) must be the same in both uses.

(B) If two registers are used as sources and a register is also used as a
destination, the register number (n) of one of the source registers must be the
same as that of the destination register.

(C) If a peripheral is used in more than one source, the number (n) must be the
same in both uses.

(D) If both a register and peripheral are used as destinations, the number (n) must
be the same in both uses.

(E) No more than one register may be used as a destination.

(F) No more than one peripheral may be used as a destination.

Table 12 below illustrates some exemplary usages of the data modify instructions.

Table 12

Code Examples / Description

out0 = in0[0];
        Pass-through data.

out1 = r[4];
        Output data from register.

out0 = 0x08BCB51717;
        Send an SOF (Start of Frame).

r[0] = 0x12345678;
        Initialize register to constant.

r[1] = r[0];
        Move register to register.

r[2] = periph[3];
        Move peripheral value to register (save DC flags).

periph[3] = r[2];
        Move register to peripheral.

r[3] = in0[1];
        Move input value to register.

periph[11] = [illegible constant];
        Store constant to peripheral.

[illegible] r[0];

r[0] = add(r[0], r[1]);
        Add register to register.

out1, r[6] = 0x0123456789;
        Set output and register to 40-bit constant.

out0, out1, r[12] = periph[3];
        Set both outputs and register to peripheral value.

out0, out1, r[3], periph[3] = in1[3];
        Multiple destinations.

r[0] = or(out0 = 1, out1 = 2)
        ALU3 bypass mode.

null = or(out0 = 1, out1 = 2)
        ALU3 results ignored.

out0 = or(r[2], periph[3]);
        Logical OR of register and peripheral value.

out0 = xor(in0[0], [illegible]

r[3] = and(in1[6], 0xffff);
        Store lower 16 bits of input to r[3].

r[7] = add(r[7], 1);
        Increment r[7].

out0 = or(and(in1[4], 0xffffff00), 0x8b);
        Output = input with byte 0 changed to 0x8b.

out0, out1, r[3], periph[3] = addp1(xor(in0[8], 0x123456789a),
    or(periph[2], 0xfedcba9876));
        Example of complex data modify instruction.

r[3], periph[3] = addp1(out0 = xor(in0[8], 0x123456789a),
    out1 = or(periph[2], 0xfedcba9876));
        With ALU3 bypass mode on both outputs.

r[3], periph[3], out1 = addp1(out0 = xor(in0[8], 0x123456789a),
    or(periph[2], 0xfedcba9876));
        With ALU3 bypass mode on OUT0 only.

r[3], periph[3], out0 = addp1(xor(in0[8], 0x123456789a),
    out1 = or(periph[2], 0xfedcba9876));
        With ALU3 bypass mode on OUT1 only.

out0 = or(in0[1], in0[2]);
r[0] = and(r[1], r[2]);
r[0] = add(periph[0], periph[1]);
r[0], periph[1] = 2;
r[0], r[1] = 0;
periph[0], periph[1] = r[6];
        Examples of illegal usage.

Peripheral Unit and Control Registers 

The peripheral unit 140 is accessed via a set of registers referenced by the 
instructions as periph[n]. The peripheral unit 140 is divided into a number of subunits, 
which are described in more detail below. Table 13 below shows the address map of the 
subunits and registers in the peripheral unit. 



Table 13

Register Name    Address      Description                          Subunit                   Read/Write
-------------    ---------    ---------------------------------    ----------------------    ----------
EXT_WR_DATA      periph[0]    External Memory Interface write      External Memory           W
                              data with normal addressing          Interface Unit
EXT_RD_DATA      periph[0]    External Memory Interface read       External Memory           R
                              data with normal addressing          Interface Unit
MAILBOX_W        periph[1]    Mailbox Register to host             Local Interface Unit      W
MAILBOX_R        periph[1]    Mailbox Register from host           Local Interface Unit      R
CTR_32           periph[3]    Counter 3 (upper 20 bits) and        Counter Unit              R
                              Counter 2 (lower 20 bits)
CTR_INC          periph[3]    Counter Increment register           Counter Unit              W
ENG_CTRL         periph[4]    Control Register                     [Global]                  W
TRAP_CTRL        periph[5]    Trap Control Register                Trap Unit                 W
CTR_DATA         periph[6]    Counter Data register                Counter Unit              W
PERIPH_CTRL      periph[7]    Peripheral Control register          [Global]                  W
EXT_WR_DATA_I    periph[8]    External Memory Interface write      External Memory           W
                              data with ALU2 indexed addressing    Interface Unit
EXT_RD_DATA_I    periph[8]    External Memory Interface read       External Memory           R
                              data with ALU2 indexed addressing    Interface Unit
RESERVED         others       Reserved

The formats of the peripheral subunit registers are described in Appendix A.

Alternate Embodiments

While the present invention has been described with reference to a few specific
embodiments, the description is illustrative of the invention and is not to be construed as
limiting the invention. Various modifications may occur to those skilled in the art without
departing from the true spirit and scope of the invention as defined by the claims below.

APPENDIX A

Peripheral Register Formats 



EXT_WR_DATA - External Memory Interface Write Data - Write Only 



Field Name   Bits    Function
----------   ----    ------------------------------------------------------------
data         39-0    This value is written to the external memory interface write
                     data bus. Writing this value also causes the interface chip
                     select and write strobe to be asserted. The address presented
                     to the external memory interface during the write is the
                     concatenated value of Counter 3 (upper 20 bits) and Counter 2
                     (lower 20 bits).

                     The instruction writing the memory interface does not stall
                     due to a deasserted interface RDY signal; instead, this
                     signal can be used as part of a branch/execute/trap condition
                     to provide software-based wait states (during which other
                     useful instructions may execute). The write value has not
                     necessarily been accepted by the external memory until it
                     asserts RDY.



EXT_WR_DATA_I - External Memory Interface Write Data with ALU2 Indexed
Addressing - Write Only

Field Name   Bits    Function
----------   ----    ------------------------------------------------------------
data         39-0    This register functions equivalently to the EXT_WR_DATA
                     register, except that the address presented to the external
                     memory interface is Counter32 + the ALU2 result.
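
The two addressing modes can be summarized in a brief C sketch. The function names ext_mem_addr and ext_mem_addr_indexed are hypothetical, and the wrap-around of the indexed address at 40 bits is an assumption; the text only states that the indexed address is Counter32 plus the ALU2 result.

```c
#include <stdint.h>

#define CTR_MASK  0xFFFFFULL        /* each counter is 20 bits wide */
#define WORD_MASK 0xFFFFFFFFFFULL   /* assumed 40-bit address width */

/* Normal addressing: Counter 3 supplies the upper 20 address bits and
 * Counter 2 the lower 20 bits (the concatenated value is "Counter32"). */
uint64_t ext_mem_addr(uint32_t counter3, uint32_t counter2)
{
    return ((counter3 & CTR_MASK) << 20) | (counter2 & CTR_MASK);
}

/* ALU2 indexed addressing: Counter32 plus the ALU2 result.
 * Truncation to 40 bits is an assumption of this sketch. */
uint64_t ext_mem_addr_indexed(uint32_t counter3, uint32_t counter2,
                              uint64_t alu2_result)
{
    return (ext_mem_addr(counter3, counter2) + alu2_result) & WORD_MASK;
}
```

Indexed addressing lets a single instruction reach an offset from the counter-held base address without first modifying the counters.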



EXT_RD_DATA - External Memory Interface Read Data - Read Only

Field Name   Bits    Function
----------   ----    ------------------------------------------------------------
data         39-0    This value is read from the external memory interface read
                     data bus. Reading this value also causes the interface chip
                     select and read strobe to be asserted. The address presented
                     to the external memory interface during the read is the
                     concatenated value of Counter 3 (upper 20 bits) and Counter 2
                     (lower 20 bits).

                     The instruction reading the memory interface does not stall
                     due to a deasserted interface RDY signal; instead, this
                     signal can be used as part of a branch/execute/trap condition
                     to provide software-based wait states (during which other
                     useful instructions may execute). The read value is not
                     necessarily valid until the external memory asserts RDY.



EXT_RD_DATA_I - External Memory Interface Read Data with ALU2 Indexed
Addressing - Read Only

Field Name   Bits    Function
----------   ----    ------------------------------------------------------------
data         39-0    This register functions equivalently to the EXT_RD_DATA
                     register, except that the address presented to the external
                     memory interface is Counter32 + the ALU2 result.



MAILBOX_W - Mailbox Register to Host - Write Only (Processor), Read Only (Host) 



Field Name   Bits     Function
----------   -----    -----------------------------------------------------------
res          39-32    Reserved, write 0
data         31-0     Mailbox register value. This value is writeable by the
                      PicoEngine and readable by the host CPU for communication
                      between the PicoEngine and host. The data contained in this
                      register is application-dependent.

MAILBOX_R - Mailbox Register from Host - Read Only (Processor), Write Only
(Host)

Field Name   Bits     Function
----------   -----    -----------------------------------------------------------
res          39-32    Reserved, write 0
data         31-0     Mailbox register value. This value is readable by the
                      PicoEngine and writeable by the host CPU for communication
                      between the PicoEngine and host. The data contained in this
                      register is application-dependent.



CTR_32 - Counter 32 Register - Read Only

Field Name   Bits     Function
----------   -----    -----------------------------------------------------------
counter3     39-20    Value of counter 3, also used for external memory address
                      high bits.
counter2     19-0     Value of counter 2, also used for external memory address
                      low bits.



CTR_INC - Counter Increment Register - Write Only

Field Name   Bits    Function
----------   ----    ------------------------------------------------------------
X            39-0    Writing this register increments any counter programmed to
                     increment on a write to CTR_INC (as determined by the
                     ctr*_inc_on_wr bits in the PERIPH_CTRL register). The value
                     written is irrelevant.



CTR_DATA - Counter Data Register - Write Only 



Field Name      Bits    Function
ctr_31          39-20   This data is written to counters 3 and 1
                        when those counters are enabled by the
                        corresponding ctr_wren bits in the
                        PERIPH_CTRL register.
ctr_20          19-0    This data is written to counters 2 and 0
                        when those counters are enabled by the
                        corresponding ctr_wren bits in the
                        PERIPH_CTRL register.
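
The effect of a CTR_DATA write can be modelled with a short Python sketch. It assumes that bit n of the 4-bit ctr_wren field enables counter n (matching bits 23-20 of PERIPH_CTRL); the function name and the list representation of the counters are hypothetical.

```python
COUNTER_MASK = 0xFFFFF  # counters are 20 bits wide


def write_ctr_data(counters, value, ctr_wren):
    """Return the counters [ctr0, ctr1, ctr2, ctr3] after a CTR_DATA write.

    value    -- 40-bit word; bits 39-20 are ctr_31, bits 19-0 are ctr_20
    ctr_wren -- 4-bit write-enable field; bit n enables the write to counter n
    """
    ctr_31 = (value >> 20) & COUNTER_MASK   # data for counters 3 and 1
    ctr_20 = value & COUNTER_MASK           # data for counters 2 and 0
    source = [ctr_20, ctr_31, ctr_20, ctr_31]
    return [source[n] if (ctr_wren >> n) & 1 else counters[n] for n in range(4)]
```

For example, a write with ctr_wren = 0b1010 loads the upper 20 bits into counters 3 and 1 and leaves counters 2 and 0 unchanged.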



ENG_CTRL - Control Register - Write Only 



Field Name      Bits    Function
res             39-38   Reserved, write 0
reg_bank_ren    37-36   Register bank read enable. Selects which
                        register bank will be read when a register
                        (r[0] through r[15]) is used as a source in
                        Data Compare or Data Modify instructions.
                        Each bank includes 16 independent
                        registers. Background-running instructions
                        read from the bank that was active at the
                        time the background-running instruction was
                        issued. [Note: Engines currently only
                        support Bank 0 unless specially configured
                        during hardware synthesis. Ask PG if in
                        doubt].
                        11: Bank 3
                        10: Bank 2
                        01: Bank 1
                        00: Bank 0
reg_bank_wen    35-32   Write enable bits for the four register
                        banks. Selects which banks will be written
                        when the Data Modify unit writes a register
                        (r[0] through r[15]). Each bank includes 16
                        independent registers. More than one bank
                        may be written simultaneously. [Note:
                        Engines currently only support Bank 0
                        unless specially configured during hardware
                        synthesis. Ask PG if in doubt].
                        1xxx: Enable bank 3 for write; 0xxx: disable
                        x1xx: Enable bank 2 for write; x0xx: disable
                        xx1x: Enable bank 1 for write; xx0x: disable
                        xxx1: Enable bank 0 for write; xxx0: disable
out1_en         31      Output bus 1 update enable. When this bit
                        is 1, the output bus is in passthrough mode
                        and passes data from its default source
                        whenever the bus is not being written by a
                        Data Modify instruction. When 0, the bus
                        holds its previous value.
out1_src        29      Selects the default source for output bus
                        1. The data from this source is passed to
                        the output bus whenever a Data Modify
                        instruction isn't updating the bus, and the
                        bus update enable (out1_en) is 1. The
                        values for src are:
                        0: input bus 0 passthrough pipeline
                        1: input bus 1 passthrough pipeline
                        The number of clocks of input to output
                        delay is set by the p1_word_sel field.
out0_src        28      Same as above, for output bus 0.
p1_word_sel     27-24   Word select for the in1 to output bus
                        passthrough pipeline. This gives the number
                        of clocks (equal to p1_word_sel + 2) of
                        delay between input bus 1 and the output
                        bus in passthrough mode. An output bus is
                        in passthrough mode whenever it isn't being
                        updated by a DM instruction, and its out_en
                        field is 1.
p0_word_sel     23-20   Same functionality as above, for the in0 to
                        output bus passthrough pipeline.
flag_upd_cfg    19      DC instruction compare flag update control.
                        Used in conjunction with the DC control
                        field flag_update() to set the compare flag
                        update mode as follows:
                        flag_upd_cfg  update  Update mode
                        0             0       SET
                        0             1       AND
                        1             0       OR
                        1             1       XOR
comp_mode       18-14   Selects the comparator mode (0 = equality,
                        1 = magnitude) for each DC comparator. In
                        equality mode, the comparator result is 1
                        if (data & mask) == match, otherwise 0. In
                        magnitude mode, the result is 1 if (data &
                        mask) >= match, otherwise 0.
                        [Magnitude mode issues and description]
pb_en           13      Enable for Data Compare input pipeline B.
                        0: disable pipeline (does not advance)
                        1: enable pipeline (advances 1 word per
                        instruction)
pb_src          12-8    Source bus for Data Compare input pipeline
                        B (one bit per input bus byte).
                        0: input bus 0
                        1: input bus 1
res             7-6     Reserved, write 0
pa_en           5       Enable for Data Compare input pipeline A.
                        0: disable pipeline (does not advance)
                        1: enable pipeline (advances 1 word per
                        instruction)
pa_src          4-0     Source bus for Data Compare input pipeline
                        A (one bit per input bus byte).
                        0: input bus 0
                        1: input bus 1
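
The ENG_CTRL fields described above can be illustrated with a small Python model that packs field values into a 40-bit control word and mirrors the comparator and flag-update rules. This is a sketch under stated assumptions, not the engine's actual behavior: the function names are hypothetical, the bank-enable field names (reg_bank_ren, reg_bank_wen) are this sketch's reading of the table, bit 30 is left out because the table does not describe it, and the AND/OR/XOR update modes are assumed to combine the previous flag value with the new comparator result.

```python
# Field layout as (least significant bit, width), read from the ENG_CTRL
# description. Reserved bits and the undescribed bit 30 are left out.
ENG_CTRL_FIELDS = {
    "reg_bank_ren": (36, 2),
    "reg_bank_wen": (32, 4),
    "out1_en": (31, 1),
    "out1_src": (29, 1),
    "out0_src": (28, 1),
    "p1_word_sel": (24, 4),
    "p0_word_sel": (20, 4),
    "flag_upd_cfg": (19, 1),
    "comp_mode": (14, 5),
    "pb_en": (13, 1),
    "pb_src": (8, 5),
    "pa_en": (5, 1),
    "pa_src": (0, 5),
}


def pack_eng_ctrl(**fields):
    """Build a 40-bit ENG_CTRL word from named field values."""
    word = 0
    for name, value in fields.items():
        lsb, width = ENG_CTRL_FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lsb
    return word


def passthrough_delay(word_sel):
    """Clocks of input-to-output delay for a p0/p1_word_sel value."""
    return word_sel + 2


def comparator_result(data, mask, match, magnitude):
    """Equality mode: (data & mask) == match. Magnitude mode: >= match."""
    masked = data & mask
    return int(masked >= match) if magnitude else int(masked == match)


def update_flag(old_flag, result, flag_upd_cfg, update):
    """Apply the SET/AND/OR/XOR update mode (combination rule is assumed)."""
    mode = (flag_upd_cfg << 1) | update
    if mode == 0:
        return result                # SET
    if mode == 1:
        return old_flag & result     # AND
    if mode == 2:
        return old_flag | result     # OR
    return old_flag ^ result         # XOR
```

For instance, pack_eng_ctrl(reg_bank_wen=0b0001, out1_en=1, pa_en=1) enables writes to bank 0, puts output bus 1 in passthrough mode, and lets Data Compare input pipeline A advance.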



TRAP_CTRL - Trap Control Register - Write Only 



Field Name      Bits    Function
res             39-32   Reserved, write 0
trap_relative   31      Trap relative address enable. When 1,
                        trap_addr is treated as a sign-extended
                        relative address; a trap causes control to
                        transfer to the PC + trap_addr. When 0,
                        trap_addr is treated as an absolute
                        address; a trap causes control to transfer
                        to trap_addr.
trap_restore    30      Controls the state of the trap_en bit after
                        a return from the trap routine. When set,
                        trap_en is restored to its value before the
                        trap (see trap_en). Otherwise, trap_en
                        remains disabled after the return from the
                        trap routine.
trap_en         29      Trap enable. Enables traps when 1, disables
                        them when 0. When the trap is enabled and
                        its match/mask/tf condition is satisfied,
                        control transfers to the target address
                        specified by the trap_addr and
                        trap_relative fields.
                        trap_en is cleared upon entry to the trap
                        routine, thus disabling further traps. If
                        trap_restore is set, the bit will be
                        restored to its value before the trap upon
                        return from the trap routine (which occurs
                        via a branch to the saved PC). However, if
                        software writes this bit before the trap
                        routine returns, the bit written will be
                        preserved upon the return.
trap_f          28      Trap on match/mask true/false. Determines
                        whether the trap should be taken if its
                        match/mask condition is true (trap_f = 0)
                        or false (trap_f = 1).
trap_match      27-20   Trap condition match bits. These bits
                        specify the trap condition in the same
                        manner as the branch/execute condition
                        bits.
                        bits 27-26: match bits for external
                        interrupts 1-0 respectively
                        bit 25: match bit for the Peripheral flag
                        bits 24-20: match bits for Data Compare
                        flags 4-0 respectively
trap_mask       19-12   Trap condition mask bits. These bits
                        specify the trap condition in the same
                        manner as the branch/execute condition
                        bits.
                        bits 19-18: mask bits for external
                        interrupts 1-0 respectively
                        bit 17: mask bit for the Peripheral flag
                        bits 16-12: mask bits for Data Compare
                        flags 4-0 respectively
res             11-10   Reserved, write 0
trap_addr       9-0     Trap destination address.
                        Holds the target address for traps. Control
                        is transferred to trap_addr (if
                        trap_relative = 0) or the current PC +
                        trap_addr (if trap_relative = 1) when traps
                        are enabled and the trap match/mask/tf
                        condition is satisfied. Indirect branching
                        may be implemented by writing the target
                        address to this field and trapping on an
                        always-satisfied condition.
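
The trap target calculation and the match/mask/tf test can be sketched in Python as below. Several points are assumptions made only for illustration: the program counter is taken to be 10 bits wide (the same width as trap_addr) and to wrap, the 8 condition flags are ordered as in the trap_match field (external interrupts 1-0, the Peripheral flag, then Data Compare flags 4-0), and a condition is taken to be true when every flag selected by a 1 in the mask equals the corresponding match bit. The branch/execute condition encoding itself is defined elsewhere in the document, and all function names here are hypothetical.

```python
ADDR_BITS = 10                       # width of the trap_addr field
ADDR_MASK = (1 << ADDR_BITS) - 1     # assumption: the PC is also 10 bits wide


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def trap_target(pc, trap_addr, trap_relative):
    """Target address of a trap: absolute, or PC plus a signed offset."""
    if trap_relative:
        return (pc + sign_extend(trap_addr, ADDR_BITS)) & ADDR_MASK
    return trap_addr & ADDR_MASK


def trap_taken(flags, trap_match, trap_mask, trap_f, trap_en):
    """Decide whether a trap fires (assumed match/mask semantics)."""
    condition = (flags & trap_mask) == (trap_match & trap_mask)
    # trap_f = 0 traps when the condition is true, trap_f = 1 when false.
    return bool(trap_en) and (condition != bool(trap_f))
```

Under these assumptions, a mask of 0 with trap_f = 0 is an always-satisfied condition, which is the configuration the trap_addr description suggests for indirect branching.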



PERIPH_CTRL - Peripheral Control Register - Write Only 



Field Name      Bits    Function
res             39      Reserved, write 0
ct_f            38      Count on match/mask true/false. Determines
                        whether counting should occur if the
                        match/mask condition is true (ct_f = 0) or
                        false (ct_f = 1).
ct_mask         37-32   Count enable condition mask bits. These
                        bits specify the count condition (when
                        count enable on match/mask/tf is configured
                        by ctr*_ie_sel) in the same manner as the
                        branch/execute condition bits.
                        bit 37: mask bit for the Peripheral flag
                        bits 36-32: mask bits for Data Compare
                        flags 4-0 respectively
pf_en_hi        31-30   (See pf_en)
ct_match        29-24   Count enable condition match bits. These
                        bits specify the count condition (when
                        count enable on match/mask/tf is configured
                        by ctr*_ie_sel) in the same manner as the
                        branch/execute condition bits.
                        bit 29: match bit for the Peripheral flag
                        bits 28-24: match bits for Data Compare
                        flags 4-0 respectively
ctr_wren        23-20   Counter write enables. These bits enable
                        one or more of the counters for writing
                        when the CTR_DATA register is written.
                        bit 23: 1 = enable write to counter 3, 0 =
                        disable
                        bit 22: 1 = enable write to counter 2, 0 =
                        disable
                        bit 21: 1 = enable write to counter 1, 0 =
                        disable
                        bit 20: 1 = enable write to counter 0, 0 =
                        disable
pf_en           19-16   Peripheral flag enable bits, used in
                        combination with pf_en_hi. Selects the
                        source(s) of the Peripheral flag (the P bit
                        of the Flags register) used in branch,
                        execute, trap, and count conditions. All
                        sources with an enable bit of 1 are
                        logically ANDed to generate the P bit;
                        sources with an enable bit of 0 are
                        ignored.
                        pf_en_hi, pf_en, source:
                        1x xxxx: Data Modify unit ALU3 carry flag
                        x1 xxxx: EXT_RDY (ready flag) signal from
                        External Memory Interface
                        xx 1xxx: Counter 3 wrap flag; 1 when
                        counter 3 wraps from 0xfffff to 0
                        xx x1xx: Counter 2 wrap flag; 1 when
                        counter 2 wraps from 0xfffff to 0
                        xx xx1x: Counter 1 wrap flag; 1 when
                        counter 1 wraps from 0xfffff to 0
                        xx xxx1: Counter 0 wrap flag; 1 when
                        counter 0 wraps from 0xfffff to 0
                        Note: each counter wrap flag maintains its
                        state until the counter is next updated,
                        either by an increment or software write.
                        Software writes to the CTR_DATA register
                        reset the wrap flags of any counters
                        written to.
ctr3_inc_on_wr  15      Counter 3 increment enable on peripheral
                        register write. If this bit is 1, counter 3
                        will be incremented on any write to the
                        CTR_INC register as well as any conditions
                        generated due to the ctr3_ie_sel bits. If
                        this bit is 0 or whenever CTR_INC is not
                        written, counting is controlled by the
                        ctr3_ie_sel bits.
ctr3_ie_sel     14-12   Counter 3 default increment enable bits.
                        Selects the condition for incrementing
                        counter 3.
                        111: increment when previous counter wraps
                        (cascade with previous)
                        110: increment always
                        100: increment when counter mask/match/tf
                        condition is satisfied
                        000: increment on external memory interface
                        (autoincrement)
                        others: reserved
ctr2_inc_on_wr  11      Same functionality as ctr3_inc_on_wr, for
                        counter 2.
ctr2_ie_sel     10-8    Same functionality as ctr3_ie_sel, for
                        counter 2, with the following exception:
                        111: don't increment
ctr1_inc_on_wr  7       Same functionality as ctr3_inc_on_wr, for
                        counter 1.
ctr1_ie_sel     6-4     Same functionality as ctr3_ie_sel, for
                        counter 1.
ctr0_inc_on_wr  3       Same functionality as ctr3_inc_on_wr, for
                        counter 0.
ctr0_ie_sel     2-0     Same functionality as ctr3_ie_sel, for
                        counter 0, with the following exception:
                        111: don't increment
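
The Peripheral flag selection and the counter increment rules can be modelled with the following Python sketch. It is a simplified illustration: the function and source names are hypothetical, the P bit is assumed to read 1 when no source is enabled (the identity of a logical AND; the text does not cover this case), and the can_cascade argument is this sketch's way of expressing that encoding 111 means "don't increment" for counters 2 and 0.

```python
# Sources in the order of the 6-bit {pf_en_hi, pf_en} pattern, MSB first.
PF_SOURCES = ("alu3_carry", "ext_rdy", "ctr3_wrap",
              "ctr2_wrap", "ctr1_wrap", "ctr0_wrap")


def peripheral_flag(pf_en_hi, pf_en, sources):
    """AND together every source whose enable bit is 1."""
    enable = ((pf_en_hi & 0x3) << 4) | (pf_en & 0xF)
    p = 1  # assumption: P reads 1 when no source is enabled
    for i, name in enumerate(PF_SOURCES):
        if (enable >> (5 - i)) & 1:
            p &= sources[name] & 1
    return p


def counter_increments(ie_sel, inc_on_wr, ctr_inc_written,
                       prev_wrapped=False, condition_met=False,
                       ext_mem_autoinc=False, can_cascade=True):
    """Return True if a counter increments this cycle."""
    if inc_on_wr and ctr_inc_written:
        return True                       # write to CTR_INC
    if ie_sel == 0b111:
        return can_cascade and prev_wrapped
    if ie_sel == 0b110:
        return True                       # increment always
    if ie_sel == 0b100:
        return condition_met              # count mask/match/tf condition
    if ie_sel == 0b000:
        return ext_mem_autoinc            # external memory interface
    raise ValueError("reserved ctr*_ie_sel encoding")
```

With pf_en_hi = 0b01 and pf_en = 0b1000, for example, the P bit is 1 only when EXT_RDY is asserted and counter 3 has wrapped.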